Webspace query formulation: an overview

Authors

  • Roelof van Zwol
  • Peter M.G. Apers
Abstract

To find information on the World-Wide Web (WWW), two approaches are generally followed. Browsing the web from a specific starting point, or from a web-site map, is called search by divergence. The second approach, search by convergence, is followed when using a search engine. Most search engines use an information retrieval strategy, which requires the user to supply some keywords to find the relevant information. Due to the diversity and unstructured nature of the WWW, both approaches offer only limited query formulation techniques. When focusing on smaller domains of the Internet, such as a single web-site or Intranet, large collections of documents still have to be dealt with. There the content is more related and structured, which allows us to apply database techniques to the web. The Webspace Method aims at using database (DB) techniques to model and query such document collections. A semantic level of abstraction is obtained by describing the content of the documents with high-level concepts, defined in an object-oriented schema. This allows us to bring the power of query formulation, as known within a database environment, to the web. At the same time, we focus on the integration with Information Retrieval (IR), which allows us to formulate complex content-based queries over a collection of web-based documents containing various types of multimedia. After an introduction to the Webspace Method, this article focuses on the formulation of complex queries over a collection of related multimedia documents, also called a webspace. For that purpose the Webspace Search Engine was built, which combines search by divergence and search by convergence to formulate the query, using a graphical representation of the webspace schema. Under the hood, the Webspace Search Engine uses the Data eXchange Language (DXL) to gather the requested information.
We explain DXL's framework for data exchange and discuss how it is integrated into the Webspace Search Engine. Furthermore, we show through examples how, with the help of DXL, specific parts of documents can be retrieved and integrated into the result of the query, based on the concepts defined in the webspace schema. This is in contrast to the average search engine, which delivers only a document's URL.

Similar resources

Modelling the Webspace of an Intranet

Searching the internet using the currently available search engines is not satisfactory. The techniques used there focus on the extraction of relevant information directly from the documents available on the web. We introduce a new approach, which aims at describing the content of a webspace, formed by a collection of related documents, instead of looking at the single documents. By identifying...

Using Webspaces to Model Document Collections on the Web

Due to the unstructured character of data on the web, it is hard to find specific information when surfing the web. Search engines can only base their results on the available IR techniques, and most of the time they lack the desired power in query formulation. Modelling data on the web, as if it were designed for use within databases, provides us with the necessary basis for enhancing the query...

Modelling the webspace of an intranet - Web Information Systems Engineering, 2000. Proceedings of the First International Conference on

Searching the internet using the currently available search engines is not satisfactory. The techniques used there focus on the extraction of relevant information directly from the documents available on the web. We introduce a new approach, which aims at describing the content of a webspace, formed by a collection of related documents, instead of looking at the single documents. By identifying...

DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases

In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and...

NITELIGHT: A Graphical Editor for SPARQL Queries

Query formulation is a key aspect of information retrieval, contributing to both the efficiency and usability of many semantic applications. A number of query languages, such as SPARQL, have been developed for the Semantic Web; however, there are, as yet, few tools to support end users with respect to the creation and editing of semantic queries. In this paper we present NITELIGHT, a graphical ...

Publication date: 2002